POSTERIOR INFERENCE IN CURVED EXPONENTIAL
FAMILIES UNDER INCREASING DIMENSIONS

By  Alexandre  Belloni* 

IBM  T.J.  Watson  Research  Center  and  MIT  Operations  Research  Center 

AND 

By Victor Chernozhukov†

MIT  Department  of  Economics  and  Operations  Research  Center 

In this work we study the large sample properties of posterior-based
inference in the curved exponential family under increasing dimension.
The curved structure arises from the imposition of various restrictions,
such as moment restrictions, on the model, and plays a fundamental role
in various branches of data analysis. We establish conditions under which
the posterior distribution is approximately normal, which in turn implies
various good properties of estimation and inference procedures based on
the posterior. We also discuss the multinomial model with moment
restrictions, which arises in a variety of econometric applications. In
our analysis, both the parameter dimension and the number of moments are
increasing with the sample size.

1. Introduction. The main motivation for this paper is to obtain large
sample results for posterior inference in the curved exponential family
under increasing dimension. Recall that in the exponential family, the log
of a density is linear in parameters $\theta \in \Theta$; in the curved
exponential family, these parameters $\theta$ are restricted to lie on a
curve $\eta \mapsto \theta(\eta)$ parameterized by a lower dimensional
parameter $\eta \in \Psi$. There are many classical examples of densities
that fall in the curved exponential family; see for example Efron [8],
Lehmann and Casella [14], and Barndorff-Nielsen [1]. Curved exponential
densities have also been extensively used in applications [8, 13]. An
example of the condition that puts a curved structure onto an exponential
family is a moment restriction of the type:

$$\int m(x,\alpha) f(x,\theta)\,dx = 0,$$
that restricts $\theta$ to lie on a curve that can be parameterized as
$\{\theta(\eta), \eta \in \Psi\}$, where the component $\eta = (\alpha, \beta)$
contains $\alpha$ and other parameters $\beta$ that are sufficient to
parameterize all parameters $\theta \in \Theta$ that solve the above
equation for some $\alpha$. In econometric applications, moment
restrictions often represent Euler equations that result from the data $x$
being an outcome of an optimization by rational decision-makers; see e.g.
Hansen and Singleton [9], Chamberlain [3], Imbens [11], and Donald, Imbens
and Newey [5]. Thus, the curved exponential framework is a fundamental
complement to the exponential framework, at least in certain fields of
data analysis.

Despite the importance of applications to high-dimensional data, the
theoretical properties of the curved exponential family are not as well
understood as the corresponding properties of the exponential family. In
this paper, we contribute to the theoretical analysis of posterior
inference in curved exponential families of high dimension. We provide
sufficient conditions under which consistency and asymptotic normality of
the posterior are achieved when both the dimension of the parameter space
and the sample size are increasing, i.e. "large." We allow for weak
conditions on the priors, and expressly allow for improper priors in
particular. We also study the convergence of moments and the precision
with which we can estimate them. We then apply these results to the
multinomial model with moment restrictions, where both the parameter
dimension and the number of moments are increasing with the sample size.

The  present  analysis  of  the  posterior  inference  in  the  curved  exponential 
family  builds  upon  the  previous  work  of  Ghosal  [12]  who  studied  posterior 
inference  in  the  exponential  family  under  increasing  dimension.  Under  suf- 
ficient growth  restrictions  on  the  dimension  of  the  model,  Ghosal  showed 
that  the  posterior  distributions  concentrate  in  neighborhoods  of  the  true 
parameter  and  can  be  approximated  by  an  appropriate  normal  distribution. 
Ghosal's  analysis  extended  in  a  fundamental  way  the  classical  results  of 
Portnoy  [17]  for  maximum  likelihood  methods  for  the  exponential  family 
with  increasing  dimensions. 

In  addition  to  a  detailed  treatment  of  the  curved  exponential  family,  we 
also  establish  some  useful  results  for  exponential  families.  In  fact,  we  begin 
our analysis by first revisiting Ghosal's increasing dimension results for
the exponential family. We present several results that complement
Ghosal's results in several ways: first, we amend the conditions on priors
to allow for a larger set of priors, for example, improper priors; second,
we use concentration inequalities for log-concave densities to sharpen the
conditions under which the normal approximations apply; and third, we show
that the approximation of $\alpha$-th order moments of the posterior by the
corresponding moments of the normal density becomes exponentially
difficult in the order $\alpha$.

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  formally 
define  the  framework,  assumptions,  and  develop  results  for  the  exponential 
family.  In  Section  3,  the  main  section,  we  develop  the  results  for  the  curved 
exponential  family.  In  Section  4  we  apply  our  results  to  the  multinomial 
model with moment restrictions. Appendices B, C, and D collect proofs of
the main results and technical lemmas.

2. Exponential Family Revisited. Assume that we have a triangular array
of random samples

$$X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)}.$$

Assume further that the $X_i^{(n)}$ are independent $d^{(n)}$-dimensional
vectors drawn from a $d^{(n)}$-dimensional standard exponential family
whose density is defined by

(2.1)  $f(x;\theta^{(n)}) = \exp\left(\langle x,\theta^{(n)}\rangle - \psi^{(n)}(\theta^{(n)})\right),$

where $\theta^{(n)} \in \Theta^{(n)}$, an open convex set of
$\mathbb{R}^{d^{(n)}}$, and $\psi^{(n)}$ is the associated normalizing
function. Let $\theta_0^{(n)} \in \Theta^{(n)}$ denote the (sequence of)
true parameters, which is assumed to be bounded away from the boundary of
$\Theta^{(n)}$ (uniformly in $n$). For notational convenience we will
suppress the superscript $(n)$, but it is understood that the associated
objects are changing with $n$.
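As a concrete illustration of the standard form (2.1), the following minimal sketch uses a single categorical draw with $d+1$ cells, for which the normalizing function is $\psi(\theta) = \log(1 + \sum_j e^{\theta_j})$. The function names and the numerical value of `theta` are illustrative assumptions, not part of the paper.

```python
import math

def log_partition(theta):
    """psi(theta) = log(1 + sum_j exp(theta_j)) for one categorical draw."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def density(x, theta):
    """f(x; theta) = exp(<x, theta> - psi(theta)), the form used in (2.1)."""
    inner = sum(xi * ti for xi, ti in zip(x, theta))
    return math.exp(inner - log_partition(theta))

def support(d):
    """The d + 1 outcomes: the zero vector (baseline cell) and the d unit vectors."""
    points = [[0] * d]
    for j in range(d):
        e = [0] * d
        e[j] = 1
        points.append(e)
    return points

theta = [0.3, -1.2, 0.5]  # hypothetical natural parameter, d = 3
total_mass = sum(density(x, theta) for x in support(len(theta)))
```

Summing the density over the support returns one up to rounding, which is exactly the role of the normalizing function $\psi$.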

Under this framework, the posterior density of $\theta$ given the observed
data $\{X_i\}_{i=1}^n$ is defined as

(2.2)  $\pi_n(\theta) \propto \pi(\theta)\prod_{i=1}^n f(X_i;\theta) = \pi(\theta)\exp\left(\left\langle \sum_{i=1}^n X_i, \theta\right\rangle - n\psi(\theta)\right),$

where $\pi$ ($= \pi^{(n)}$) denotes a prior distribution on $\Theta$. As
expected, we will need to impose some regularity conditions on the prior
$\pi$. These conditions differ from the ones imposed in [12]. Although the
same Lipschitz condition is required, we require only a relative lower
bound on the value of the prior at the true parameter instead of an
absolute bound (see Theorem 1). Finally, our conditions allow for improper
priors, in contrast to [12]. In fact, the uninformative prior trivially
satisfies our assumptions.
Our results are stated in terms of a re-centered gaussian distribution in
the local parameter space. Let $\mu = \psi'(\theta_0)$ and
$F = \psi''(\theta_0)$ be the mean and covariance matrix associated with
the random variables $\{X_i\}$, and let $J = F^{1/2}$ be its square root
(i.e., $JJ^T = F$). The re-centering is defined as
$\Delta_n := \sqrt{n}J^{-1}\left(\frac{1}{n}\sum_{i=1}^n X_i - \mu\right)$;
it follows that $E[\Delta_n] = 0$, and $E[\Delta_n\Delta_n^T]$ is the
identity matrix of appropriate (increasing) dimension $d$. Moreover, the
posterior in the local parameter space is defined for
$u \in U = \sqrt{n}J(\Theta - \theta_0)$ as

(2.3)  $\pi_n^*(u) = \dfrac{\pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)}{\int_U \pi(\theta_0 + n^{-1/2}J^{-1}v)\,Z_n(v)\,dv},$

where $Z_n$ is the likelihood ratio defined in Appendix B.

Along the same lines as Portnoy [17] and Ghosal [12], conditions on the
growth rates of the third and fourth moments are required. More precisely,
the following quantities play an important role in the analysis:

(2.4)  $B_{1n}(c) = \sup\left\{E_\theta\left[\langle a,V\rangle^3\right] : a \in S^{d-1},\ \|J(\theta-\theta_0)\|^2 \le \dfrac{cd}{n}\right\},$

(2.5)  $B_{2n}(c) = \sup\left\{E_\theta\left[\langle a,V\rangle^4\right] : a \in S^{d-1},\ \|J(\theta-\theta_0)\|^2 \le \dfrac{cd}{n}\right\},$

where $V$ is a random variable distributed as $J^{-1}(U - E_\theta[U])$ and
$U$ has density $f(\cdot;\theta)$ as defined in (2.1). Moreover, the
following combination of (2.4) and (2.5) is relevant:

(2.6)  $\lambda_n(c) := \dfrac{1}{6}\left(\sqrt{\dfrac{cd}{n}}\,B_{1n}(0) + \dfrac{cd}{n}\,B_{2n}(c)\right),$

where we note that $\lambda_n(c)$ is different from (in fact smaller than)
the one defined in [12]. This quantity is used to bound deviations from
normality of the posterior in a neighborhood of the true parameter.
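To make the role of the combined moment quantity concrete, here is a minimal sketch that evaluates it numerically. It assumes the form $\lambda_n(c) = \frac{1}{6}\left(\sqrt{cd/n}\,B_{1n}(0) + (cd/n)\,B_{2n}(c)\right)$, which is the form used in the proof of Lemma 2; the moment bounds are passed in as plain numbers, and the values used below are hypothetical.

```python
import math

def lambda_n(c, d, n, b1_at_zero, b2_at_c):
    """Assumed form of (2.6): (sqrt(c d / n) * B_1n(0) + (c d / n) * B_2n(c)) / 6."""
    ratio = c * d / n
    return (math.sqrt(ratio) * b1_at_zero + ratio * b2_at_c) / 6.0

# Hypothetical moment bounds equal to 1, dimension 10, two sample sizes.
small_sample = lambda_n(1.0, 10, 1_000, 1.0, 1.0)
large_sample = lambda_n(1.0, 10, 10_000, 1.0, 1.0)

# Condition (ii) of Theorem 1 asks that lambda_n(c) * d tends to zero.
condition_ii_value = 10 * large_sample
```

For fixed moment bounds the quantity shrinks at rate $\sqrt{d/n}$, so the binding requirement in practice is how fast the moment bounds themselves grow with $d$.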
Next  we  state  the  main  results  of  this  work. 

Theorem 1 For any constant $c > 0$ suppose that:

(i) $B_{1n}(c)\sqrt{d/n} \to 0$;

(ii) $\lambda_n(c)\,d \to 0$;

(iii) $\|F^{-1}\|\,d/n \to 0$;

(iv) the prior density $\pi$ satisfies $\sup_{\theta} \ln\left[\pi(\theta)/\pi(\theta_0)\right] \le O(d)$, and

$$|\ln\pi(\theta) - \ln\pi(\theta_0)| \le K_n(c)\,\|\theta - \theta_0\|$$

for any $\theta$ such that $\|\theta-\theta_0\| \le \sqrt{\|F^{-1}\|cd/n}$. We
require $K_n(c)\sqrt{\|F^{-1}\|cd/n} \to 0$.

Then we have asymptotic normality of the posterior density function, that
is,

$$\int \left|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)\right|du \to_p 0.$$

As mentioned earlier, Theorem 1 has different assumptions on the prior
than Theorem 3 of [12] has. On the other hand, it does not require the
additional technical assumptions used in [12], as discussed in Appendix B,
and the growth condition of $d$ relative to the sample size $n$ is improved
by $\ln d$ factors.

In some applications it might be desired to have stronger convergence
properties than simply asymptotic normality. The following theorem
provides sufficient conditions for the $\alpha$-moment convergence.

Theorem 2 For some sequence of $\{\alpha\}$ and $d \to \infty$, let

$$M_{d,\alpha} := (d+\alpha)\left(1 + \frac{\alpha\ln(d+\alpha)}{d+\alpha}\right).$$

Suppose that the following strengthening of assumptions (ii) and (iv) hold
for any fixed $c$:

(ii') $\lambda_n(cM_{d,\alpha}/d)\,[cM_{d,\alpha}]^{1+\alpha/2} \to 0$;

(iv') $K_n(cM_{d,\alpha}/d)\sqrt{\dfrac{\|F^{-1}\|\,[cM_{d,\alpha}]^{1+\alpha}}{n}} \to 0$.

Then we have

(2.7)  $\int \|u\|^{\alpha}\left|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)\right|du \to_p 0.$


We emphasize that Theorem 2 allows for $\alpha$ and $d$ to grow as the
sample size increases. Our conditions highlight the polynomial trade off
between $n$ and $d$ but an exponential trade off between $n$ and $\alpha$.
This suggests that the estimation of higher moments in increasing
dimension applications could be very delicate. Conditions (ii') and (iv')
simplify significantly if $\alpha = o(d)$; in such a case we have
$M_{d,\alpha} \sim d$.
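The scaling factor in Theorem 2 is easy to tabulate. The sketch below assumes the definition $M_{d,\alpha} = (d+\alpha)\left(1 + \alpha\ln(d+\alpha)/(d+\alpha)\right)$ that appears at the start of the proof of Theorem 2, and the chosen values of $d$ and $\alpha$ are purely illustrative.

```python
import math

def m_d_alpha(d, alpha):
    """Assumed definition: (d + alpha) * (1 + alpha * ln(d + alpha) / (d + alpha))."""
    total = d + alpha
    return total * (1.0 + alpha * math.log(total) / total)

# With the moment order fixed at alpha = 2, the ratio M / d approaches 1 as d grows.
ratio_small_d = m_d_alpha(10, 2) / 10
ratio_large_d = m_d_alpha(10**6, 2) / 10**6

# With alpha = 0 the factor reduces to d, which is the setting of Theorem 1.
base_case = m_d_alpha(100, 0)
```

Because $M_{d,\alpha}$ enters conditions (ii') and (iv') raised to a power that itself grows with $\alpha$, even a modest increase in the moment order tightens the requirement on $n$ considerably.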

As an illustrative example consider the multinomial distribution
application analyzed by Ghosal in [12]. Let
$\mathcal{X} = \{x^0, x^1, \ldots, x^d\}$ be the known finite support of a
multinomial random variable $X$ where $d$ is allowed to grow with sample
size $n$. For each $i$ denote by $p_i$ the probability of the event
$\{X = x^i\}$, which is assumed to satisfy $\max_i 1/p_i = O(d)$. The
parameter space is given by $\theta = (\theta_1,\ldots,\theta_d)$ where
$\theta_i = \log\left(p_i/(1 - \sum_{j=1}^d p_j)\right)$ (under the
assumption on the $p_j$'s the true value of the $\theta_j$'s is bounded).
The Fisher information matrix is given by $F = P - pp'$ where
$P = \mathrm{diag}(p)$. Using a rank-one update formula, we have

(2.8)  $F^{-1} = P^{-1} + \dfrac{P^{-1}pp'P^{-1}}{1 - p'P^{-1}p} = P^{-1} + \dfrac{\mathbf{1}\mathbf{1}'}{p_0},$

where $\mathbf{1}$ is the vector of ones and $p_0 = 1 - \sum_{j=1}^d p_j$.
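The rank-one update can be checked numerically. The following minimal sketch builds $F = P - pp'$ for a small hypothetical probability vector and multiplies it by the candidate inverse $P^{-1} + \mathbf{1}\mathbf{1}'/p_0$, where $p_0 = 1 - \sum_j p_j$ is the probability of the baseline cell; the helper names are assumptions made for the illustration.

```python
def fisher_information(p):
    """F = diag(p) - p p' for the multinomial cell probabilities p_1, ..., p_d."""
    d = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(d)]
            for i in range(d)]

def candidate_inverse(p):
    """P^{-1} + 11'/p_0, with p_0 = 1 - sum(p) the baseline-cell probability."""
    d = len(p)
    p0 = 1.0 - sum(p)
    return [[(1.0 / p[i] if i == j else 0.0) + 1.0 / p0 for j in range(d)]
            for i in range(d)]

def matmul(a, b):
    """Plain triple-loop matrix product for small square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

p = [0.1, 0.2, 0.3]  # hypothetical probabilities, so p_0 = 0.4
product = matmul(fisher_information(p), candidate_inverse(p))
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
trace_of_inverse = sum(candidate_inverse(p)[i][i] for i in range(3))
```

The trace of the candidate inverse equals $\sum_j 1/p_j + d/p_0$, which is the quantity used to bound the operator norm of $F^{-1}$.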

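Therefore we have
$\|F^{-1}\| \le \mathrm{trace}(F^{-1}) = \sum_{j=1}^d \frac{1}{p_j} + \frac{d}{p_0} \le O(d^2)$.
It is also possible to derive an expression for $J = F^{1/2}$ and its
inverse:

$$J = P^{1/2} - \frac{P^{-1/2}pp'}{1 + \sqrt{1 - p'P^{-1}p}} \quad\text{and}\quad J^{-1} = P^{-1/2} + \frac{P^{-1}pp'P^{-1/2}}{1 - p'P^{-1}p + \sqrt{1 - p'P^{-1}p}}.$$

In order to bound $\lambda_n$ we need to bound the third and fourth
moments of a random variable which define $B_{1n}$ and $B_{2n}$. Let
$a \in S^{d-1}$ and $q$ be distributed as $f(\cdot;\theta)$. We have that

$$\langle a, J^{-1}(q-p)\rangle = \sum_{j=1}^d a_j p_j^{-1/2}(q_j - p_j) + \frac{\sum_{j=1}^d a_j}{1 - p'P^{-1}p + \sqrt{1 - p'P^{-1}p}}\sum_{j=1}^d p_j^{1/2}(q_j - p_j).$$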

Under  the  assumption  on  the  pt  it  can  be  shown  that  i?i„(c)  =  0((i^'")  and 

S2„(C)  =  0(p2). 

The  relations  above  were  derived  by  Ghosal  in  [12],  where  the  growth  con- 
dition that  d^  {hi.  d)/n  — >  0  was  imposed  to  obtain  the  asymptotic  normality 
results (the case of $\alpha = 0$). We relax this growth requirement by
combining Ghosal's approach with our analysis and an uninformative
(improper) prior. In this case we have $K_n(c) = 0$, and our definition of
$\lambda_n$ removes the logarithmic factors. Therefore, Theorem 1 leads to
a weaker growth condition
in  that  it  only  requires  that  d'^/n  — >  0.  Moreover,  the  results  of  Theorem 
2.4  of  [12]  now  follow  under  the  weaker  growth  condition  that  d^/n  -^  0, 
replacing  the  previous  growth  condition  that  d^{logd)/n  -^  0.  For  higher 
moment  estimation  (a  >  0),  the  conditions  of  Theorem  2  are  satisfied  with 
the  condition  that  d'^'^'-'^^ /n  — >  0  for  any  strictly  positive  value  of  6. 

Suppose next that we are interested in allowing $\alpha$ to grow with the
sample size as well. If $d$ is growing at a polynomial rate with respect
to $n$, our results do not allow for $\alpha = O(\ln n)$. Some limitation
along these lines should be expected since there is an exponential trade
off between $\alpha$ and $n$. However, it is definitely possible to let
both the dimension and $\alpha$ grow with the sample size if, for
instance, we are willing to accept the condition that
$\alpha = O(\sqrt{\ln n})$. Such slow growth conditions illustrate the
potential limitations for the practical estimation of higher order
moments.
3. Curved Exponential Family. Next we consider the case of a curved
exponential family. Being a generalization of the canonical exponential
family, its analysis has many similarities with the previous setup.

Let $X_1, X_2, \ldots, X_n$ be iid observations from a $d$-dimensional
curved exponential family whose density function is given by

$$f(x;\eta) = \exp\left(\langle x, \theta(\eta)\rangle - \psi_n(\theta(\eta))\right),$$

where $\eta \in \Psi \subset \mathbb{R}^{d_1}$, $\theta : \Psi \to \Theta$,
an open subset of $\mathbb{R}^{d}$, and $d \to \infty$ as $n \to \infty$ as
before. In this section we assume that $J = I_d$ for notational
convenience.

The parameter of interest is $\eta$, whose true value $\eta_0$ lies in the
interior of the set $\Psi \subset \mathbb{R}^{d_1}$. The true value of
$\theta$ induced by $\eta_0$ is given by $\theta_0 = \theta(\eta_0)$. The
mapping $\eta \mapsto \theta(\eta)$ takes values from $\mathbb{R}^{d_1}$ to
$\mathbb{R}^{d}$ where $c \cdot d \le d_1 \le d$, for some $c > 0$.
Moreover, assume that $\eta_0$ is the unique solution to the system

$$\theta(\eta) = \theta_0.$$

Thus, the parameter $\theta$ corresponds to a high-dimensional linear
parametrization of the log-density, and $\eta$ describes the
lower-dimensional parametrization of the density of interest. We require
the following regularity conditions on the mapping $\theta(\cdot)$.

Assumption A. For every $\kappa$, and uniformly in
$\gamma \in B(0, \kappa\sqrt{d})$, there exists a linear operator
$G : \mathbb{R}^{d_1} \to \mathbb{R}^{d}$ such that $G'G$ has eigenvalues
bounded from above and away from zero, and for every $n$

(3.9)  $\sqrt{n}\left(\theta(\eta_0 + \gamma/\sqrt{n}) - \theta(\eta_0)\right) = r_{1n} + (I + R_{2n})G\gamma,$

where $\|r_{1n}\| \le \delta_{1n}$ and $\|R_{2n}\| \le \delta_{2n}$.
Moreover, those coefficients are such that

(3.10)  $\delta_{1n}d^{1/2} \to 0$ and $\delta_{2n}d \to 0.$

Assumption B. There exists a strictly positive constant $\varepsilon_0$
such that for every $\eta \in \Psi$ (uniformly in $n$) we have

(3.11)  $\|\theta(\eta) - \theta(\eta_0)\| \ge \varepsilon_0\|\eta - \eta_0\|.$

Thus the mapping $\eta \mapsto \theta(\eta)$ is allowed to be nonlinear and
discontinuous. For example, the additional condition of $\delta_{1n} = 0$
implies the continuity of the mapping in a neighborhood of $\eta_0$. More
generally, condition (3.10) does impose that the map admits an approximate
linearization in the neighborhood of $\eta_0$ whose quality is controlled
by the errors $\delta_{1n}$ and $\delta_{2n}$. An example of a kind of map
allowed in this framework is given in Figure 1.
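A small worked example may help to fix ideas about the linearization in Assumption A. The sketch below uses a hypothetical smooth curve $\theta(\eta) = (\eta, \eta^2)$ with $d_1 = 1$ and $d = 2$ (not a map taken from the paper), takes $G$ to be the derivative of the curve at $\eta_0$ and $R_{2n} = 0$, and computes the norm of the remainder $r_{1n}$ in (3.9).

```python
import math

def theta(eta):
    """Hypothetical curve in R^2 indexed by a scalar eta."""
    return (eta, eta * eta)

def linearization_residual(eta0, gamma, n):
    """Norm of r_1n = sqrt(n) * (theta(eta0 + gamma/sqrt(n)) - theta(eta0)) - G * gamma."""
    root_n = math.sqrt(n)
    g = (1.0, 2.0 * eta0)  # derivative of the curve at eta0, used as G
    moved = theta(eta0 + gamma / root_n)
    base = theta(eta0)
    residual = [root_n * (m - b) - gi * gamma for m, b, gi in zip(moved, base, g)]
    return math.sqrt(sum(r * r for r in residual))

# For this curve the remainder is exactly gamma^2 / sqrt(n).
residual_small_n = linearization_residual(1.0, 2.0, 100)
residual_large_n = linearization_residual(1.0, 2.0, 10_000)
```

In this example the remainder decays like $n^{-1/2}$ for fixed $\gamma$; Assumption A additionally requires the decay to be fast enough relative to $d^{1/2}$ when $\gamma$ ranges over a ball whose radius grows with $\sqrt{d}$.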
Fig 1. This figure illustrates the mapping $\theta(\cdot)$. The
(discontinuous) solid line is the mapping while the dashed line represents
the linear map induced by $G$. The dash-dot line represents the deviation
band controlled by $r_{1n}$ and $R_{2n}$.


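Again, given a prior $\pi$ on $\Theta$, the posterior of $\eta$ given the
data is denoted by

$$\pi_n(\eta) \propto \pi(\theta(\eta)) \cdot \prod_{i=1}^n f(X_i;\eta) = \pi(\theta(\eta)) \cdot \exp\left(n\langle \bar X, \theta(\eta)\rangle - n\psi(\theta(\eta))\right),$$

where $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$.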

Under this framework, we also define the local parameter space to describe
contiguous deviations from the true parameter as

$$\gamma = \sqrt{n}(\eta - \eta_0), \quad\text{and let}\quad s = (G'G)^{-1}G'\sqrt{n}(\bar X - \mu)$$

(once more) be a first order approximation to the normalized maximum
likelihood/extremum estimate. Again, similar bounds hold for $s$:
$E[s] = 0$, $E[ss^T] = (G'G)^{-1}$, and $\|s\| = O_p(\sqrt{d})$. The
posterior density of $\gamma$ over $\Gamma$, where
$\Gamma = \sqrt{n}(\Psi - \eta_0)$, is
$\pi_n^*(\gamma) = \ell(\gamma)\big/\int_\Gamma \ell(\gamma)\,d\gamma$, where

(3.12)  $\ell(\gamma) = \exp\left(n\left\langle \bar X, \theta(\eta_0 + n^{-1/2}\gamma) - \theta(\eta_0)\right\rangle - n\left[\psi(\theta(\eta_0 + n^{-1/2}\gamma)) - \psi(\theta(\eta_0))\right]\right).$
By construction we have $\ell(\gamma) = Z_n(u_\gamma)$ with
$u_\gamma = \sqrt{n}\left(\theta(\eta_0 + n^{-1/2}\gamma) - \theta(\eta_0)\right)$,
where $u_\gamma \in U \subset \mathbb{R}^{d}$.

Next we first show that the tails have small mass outside a
$\sqrt{d}$-neighborhood in $\Gamma$. We also need an additional condition
on $a_n$ as defined in (B.18) and repeated here for the reader's
convenience:

$$a_n = \sup\{c : \lambda_n(c) < 1/16\}.$$

Therefore, using Lemma 2, in a neighborhood of size $\sqrt{a_n d}$ we can
still bound $Z_n$ from above with a proper gaussian. In the next lemma it
is required that $\log d = o(a_n)$, which is a substantially weaker
condition than the one used in [12] for establishing asymptotic normality
for the posterior of (regular) exponential densities,
$\lambda_n(c\log d)\,d = o(1)$.
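The threshold $a_n$ can be computed by bisection once the moment bounds are available. The sketch below is only an illustration under simplifying assumptions: it uses the form $\lambda_n(c) = \frac{1}{6}\left(\sqrt{cd/n}\,B_{1n}(0) + (cd/n)\,B_{2n}(c)\right)$ and treats both moment bounds as hypothetical constants that do not vary with $c$, so that $\lambda_n$ is increasing in $c$.

```python
import math

def lambda_n(c, d, n, b1, b2):
    """Assumed form of lambda_n with moment bounds held constant in c."""
    ratio = c * d / n
    return (math.sqrt(ratio) * b1 + ratio * b2) / 6.0

def a_n(d, n, b1, b2, tol=1e-10):
    """Bisection for sup{c : lambda_n(c) < 1/16}; needs b1 > 0 or b2 > 0."""
    target = 1.0 / 16.0
    lo, hi = 0.0, 1.0
    while lambda_n(hi, d, n, b1, b2) < target:  # expand until the bound is crossed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if lambda_n(mid, d, n, b1, b2) < target:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical constants: the threshold grows as the sample size increases.
threshold_small_n = a_n(10, 1_000, 1.0, 1.0)
threshold_large_n = a_n(10, 10_000, 1.0, 1.0)
```

Comparing the resulting threshold with $\log d$ gives a rough numerical feel for the requirement $\log d = o(a_n)$ used in Lemma 1.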

Lemma 1 Assume that (i), (ii), (iii), and (iv) hold. In addition, suppose
that $\log d = o(a_n)$. Then, for some constant $k$ independent of $d$ and
$d_1$, we have

$$\int_{\Gamma\setminus B(0,k\sqrt{d})} \pi\left(\theta(\eta_0) + n^{-1/2}u_\gamma\right)Z_n(u_\gamma)\,d\gamma \le o\left(\int_{\Gamma} \pi\left(\theta(\eta_0) + n^{-1/2}u_\gamma\right)Z_n(u_\gamma)\,d\gamma\right).$$

Comment 3.1 The only assumption made on $d_1$ in the previous lemma was
that $d_1 \le d$. If $d_1\log d = o(d)$ the proof simplifies significantly
(there is no need to define region (II)).

Next we address the consistency question for the maximum likelihood
estimator associated with the curved exponential family.

Theorem 3 In addition to Assumptions A and B, suppose that
$a_n \to \infty$ and (iv) hold. Then the maximum likelihood estimator
$\hat\eta$ satisfies

$$\|\hat\eta - \eta_0\| = O\left(\sqrt{d/n}\right).$$

Two remarks regarding Theorem 3 are worth mentioning. First, a sufficient
condition for $a_n \to \infty$ is simply $\lambda_n(c) \to 0$, which is
stronger than the condition $\sqrt{d/n}\,B_{1n}(c) \to 0$ needed for
consistency in the exponential case obtained by Ghosal in [12]. Second,
our consistency result relies on the dimension of the larger model $d$.

Finally, we can state the asymptotic normality result for the curved
exponential family.

Theorem 4 Suppose that Assumptions A, B, (i), (ii), (iii), and (iv) hold.
In addition, suppose that $\log d = o(a_n)$. Then, asymptotic normality for
the posterior density associated with the curved exponential family holds,

$$\int \left|\pi_n^*(\gamma) - \phi_{d_1}\left(\gamma; s, (G'G)^{-1}\right)\right|d\gamma \to_p 0.$$
4. Multinomial Model with Moment Restrictions. In this section we
provide a high-level discussion of the multinomial model with moment
restrictions. Let $\mathcal{X} = \{x^0, x^1, x^2, \ldots, x^d\}$ be the
known finite support of a multinomial random variable $X$ which was
described in Section 2. Conditions (i) to (iv) were verified in the same
section.

As discussed in the introduction, it is of interest to incorporate moment
restrictions into this model; see Imbens [11] for a discussion. This will
lead to a curved exponential model as studied in Section 3.

The parameter of interest is $\eta \in \Psi \subset \mathbb{R}^{d_1}$, a
compact set. Consider a (twice continuously differentiable) vector-valued
moment function $m : \mathcal{X} \times \Psi \to \mathbb{R}^{M}$ such that

$$E[m(X,\eta)] = 0 \quad\text{for a unique } \eta_0 \in \Psi.$$

The case of interest is the one in which the cardinality $d$ of the
support $\mathcal{X}$ is larger than the number of moment conditions $M$,
which in turn is larger than the dimension $d_1$ of the parameter of
interest $\eta$. The log-likelihood function associated with this model is

(4.13)  $l(q,\eta) = \sum_{i=1}^n\sum_{j=0}^d I\{X_i = x^j\}\ln q_j$

for some $q$ and $\eta$ such that $\sum_{j=0}^d q_j m(x^j,\eta) = 0$ and
$\sum_{j=0}^d q_j = 1$, and $l(q,\eta) = -\infty$ if $(q,\eta)$ violates
any of the moment conditions. This log-likelihood function induces the
mapping $q : \Psi \to \Delta^{d}$, formally defined as

(4.14)  $q(\eta) = \arg\max_{q}\ l(q,\eta) \quad\text{subject to}\quad \sum_{j=0}^d q_j m(x^j,\eta) = 0,\ \sum_{j=0}^d q_j = 1,\ q \ge 0.$

As discussed in Section 2 the function
$\theta_j(\eta) = \log\left(q_j(\eta)/q_0(\eta)\right)$ (for
$j = 1,\ldots,d$) is the natural $\theta(\cdot) : \Psi \to \Theta$ mapping.
Assuming that the matrix $E[m(X,\eta)m(X,\eta)']$ is uniformly positive
definite over $\eta$, Qin and Lawless [18] use the inverse function theorem
to show that it is twice continuously differentiable in $\eta$ in a
neighborhood of $\eta_0$. In particular this implies
that  Assumption  A  holds  with  S2n  =  0  and  (5i„  =  0(dd'i{d/n)].  It  suffices 
to  have  d*-^/n  -^  0. 

In order to verify Assumption B, we use that the parameter $\eta$ belongs
to a compact set $\Psi$, and assume that the mapping is injective (over a
set that contains $\Psi$ in its interior). We refer to Newey and McFadden
[16] for a discussion of primitive assumptions for identification with
moment restrictions.
APPENDIX A: NOTATION

For $a, b \in \mathbb{R}^{d}$, their (Euclidean) inner product is denoted
by $\langle a,b\rangle$, and $\|a\| = \sqrt{\langle a,a\rangle}$. The unit
sphere in $\mathbb{R}^{d}$ is denoted by
$S^{d-1} = \{v \in \mathbb{R}^{d} : \|v\| = 1\}$. For a linear operator $A$,
the operator norm is denoted by $\|A\| = \sup\{\|Aa\| : \|a\| = 1\}$. Let
$\phi_d(\cdot;\mu,V)$ denote the $d$-dimensional gaussian density function
with mean $\mu$ and covariance matrix $V$.

APPENDIX  B:  TECHNICAL  RESULTS 

In this section we prove the technical lemmas needed to prove our main
result in the following section. Our exposition follows the work of Ghosal
[12]. For the sake of completeness we include Proposition 1, which can be
found in Portnoy [17], and a specialized version of Lemma 1 of Ghosal [12].
All the remaining proofs use different techniques and weaker assumptions,
which lead to a sharper analysis. In particular, we no longer require the
prior to be proper, no bounds on the growth of $\det(\psi''(\theta_0))$ are
imposed, and $\ln n$ and $\ln d$ do not need to be of the same order.

As mentioned earlier we follow the notation in Ghosal [12]: for $u \in U$
let

(B.15)  $\tilde Z_n(u) = \exp\left(\langle u,\Delta_n\rangle - \|u\|^2/2\right)$ and

(B.16)  $Z_n(u) = \exp\left(\left\langle \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i,\ J^{-1}u\right\rangle - n\left[\psi(\theta_0 + n^{-1/2}J^{-1}u) - \psi(\theta_0)\right]\right);$

otherwise (if $\theta_0 + n^{-1/2}J^{-1}u \notin \Theta$), let
$Z_n(u) = \tilde Z_n(u) = 0$. The quantity (B.16) denotes the likelihood
ratio associated with $f$ as a function of $u$. In a parallel manner,
(B.15) is associated with a standard gaussian density.

We start by recalling a result on the Taylor expansion of $\psi$ which is
key to controlling deviations between $Z_n(u)$ and $\tilde Z_n(u)$.

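Proposition 1 (Portnoy [17]) Let $\psi'$ and $\psi''$ denote respectively
the gradient and the Hessian of $\psi$. For any
$\theta, \theta_0 \in \Theta$, there exists
$\tilde\theta = \lambda\theta + (1-\lambda)\theta_0$, for some
$\lambda \in [0,1]$, such that

(B.17)  $\psi(\theta) = \psi(\theta_0) + \langle \psi'(\theta_0), \theta-\theta_0\rangle + \frac{1}{2}\langle \theta-\theta_0, \psi''(\theta_0)(\theta-\theta_0)\rangle + \frac{1}{6}E_{\theta_0}\left[\langle\theta-\theta_0,U\rangle^3\right] + \frac{1}{24}\left\{E_{\tilde\theta}\left[\langle\theta-\theta_0,U\rangle^4\right] - 3\left(E_{\tilde\theta}\left[\langle\theta-\theta_0,U\rangle^2\right]\right)^2\right\},$

where $E_\theta[g(U)]$ denotes the expectation of $g(V - E[V])$ and
$V \sim f(\cdot;\theta)$.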
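The expansion can be checked by hand in the simplest one-dimensional case. The sketch below uses the Bernoulli family, $\psi(\theta) = \log(1 + e^{\theta})$, whose central moments are available in closed form: with $p$ the success probability and $v = p(1-p)$, the third central moment is $v(1-2p)$ and the fourth-order bracket in the expansion equals the fourth cumulant $v(1-6v)$ times $(\theta-\theta_0)^4$. The choice of family and the parameter values are illustrative assumptions.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def psi(t):
    """Normalizing function of the Bernoulli family in natural form."""
    return math.log(1.0 + math.exp(t))

def fourth_cumulant(t):
    """E[(U-EU)^4] - 3 (E[(U-EU)^2])^2 for a Bernoulli variable with natural parameter t."""
    v = sigmoid(t) * (1.0 - sigmoid(t))
    return v * (1.0 - 6.0 * v)

def implied_fourth_cumulant(theta0, theta):
    """Solve the expansion for the fourth-order bracket, divided by (theta - theta0)^4."""
    h = theta - theta0
    p = sigmoid(theta0)
    v = p * (1.0 - p)
    third = v * (1.0 - 2.0 * p)
    cubic_part = psi(theta0) + p * h + 0.5 * v * h ** 2 + third * h ** 3 / 6.0
    return 24.0 * (psi(theta) - cubic_part) / h ** 4

theta0, theta1 = 0.4, 0.7  # hypothetical parameter values
implied = implied_fourth_cumulant(theta0, theta1)
segment = [theta0 + (theta1 - theta0) * i / 1000 for i in range(1001)]
lower = min(fourth_cumulant(t) for t in segment)
upper = max(fourth_cumulant(t) for t in segment)
```

The implied value falls between the smallest and largest fourth cumulant along the segment joining the two parameter values, which is what the existence of the intermediate point $\tilde\theta$ asserts.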

Based on Proposition 1 we control the pointwise deviation between $Z_n$
and $\tilde Z_n$ in a neighborhood of zero (i.e., in a neighborhood of the
true parameter).
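Lemma 2 (Essentially in Ghosal [12] or Portnoy [17]) For all $u$ such
that $\|u\| \le \sqrt{cd}$, we have

$$|\ln Z_n(u) - \ln\tilde Z_n(u)| \le \lambda_n(c)\|u\|^2 \quad\text{and}\quad \ln Z_n(u) \le \langle\Delta_n,u\rangle - \tfrac{1}{2}\|u\|^2\left(1 - 2\lambda_n(c)\right).$$

Proof. Write $\theta_u = \theta_0 + n^{-1/2}J^{-1}u$. Under our
definitions,

$$(I) := |\ln Z_n(u) - \ln\tilde Z_n(u)| = n\left|\psi(\theta_u) - \psi(\theta_0) - \langle\psi'(\theta_0),\theta_u-\theta_0\rangle - \tfrac{1}{2}\langle\theta_u-\theta_0,\psi''(\theta_0)(\theta_u-\theta_0)\rangle\right|.$$

Using Proposition 1 we have that $(I)$ is bounded above by

$$(I) \le n\left|\frac{1}{6}E_{\theta_0}\left[\left\langle \tfrac{u}{\sqrt{n}},V\right\rangle^3\right] + \frac{1}{24}\left\{E_{\tilde\theta}\left[\left\langle \tfrac{u}{\sqrt{n}},V\right\rangle^4\right] - 3\left(E_{\tilde\theta}\left[\left\langle \tfrac{u}{\sqrt{n}},V\right\rangle^2\right]\right)^2\right\}\right|$$
$$\le \frac{1}{6}\left(n^{-1/2}\|u\|^3B_{1n}(0) + n^{-1}\|u\|^4B_{2n}(c)\right) \le \lambda_n(c)\|u\|^2.$$

The second inequality follows directly from the first result. ∎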

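Next we show how to bound the integrated deviation between the quantities
in (B.15) and (B.16) restricted to the neighborhood of zero.

Lemma 3 For any $c > 0$ we have

$$\left(\int \tilde Z_n(u)\,du\right)^{-1}\int_{\{u:\|u\|\le\sqrt{cd}\}} |Z_n(u) - \tilde Z_n(u)|\,du \le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}.$$

Proof. Using $|e^x - e^y| \le |x-y|\max\{e^x,e^y\}$ and Lemma 2 (since
$\|u\| \le \sqrt{cd}$) we have

$$|Z_n(u) - \tilde Z_n(u)| \le |\ln Z_n(u) - \ln\tilde Z_n(u)|\exp\left(\langle\Delta_n,u\rangle - \tfrac{1}{2}(1-2\lambda_n(c))\|u\|^2\right)$$
$$\le \lambda_n(c)\|u\|^2\exp\left(\langle\Delta_n,u\rangle - \tfrac{1}{2}(1-2\lambda_n(c))\|u\|^2\right).$$

By integrating over the set $H(\sqrt{cd}) = \{u : \|u\| \le \sqrt{cd}\}$ we
obtain

$$\int_{H(\sqrt{cd})} |Z_n(u) - \tilde Z_n(u)|\,du \le \int_{H(\sqrt{cd})} \lambda_n(c)\|u\|^2\exp\left(\langle\Delta_n,u\rangle - \tfrac{1}{2}(1-2\lambda_n(c))\|u\|^2\right)du$$
$$\le cd\,\lambda_n(c)\int_{H(\sqrt{cd})}\exp\left(\langle\Delta_n,u\rangle - \tfrac{1}{2}(1-2\lambda_n(c))\|u\|^2\right)du$$
$$\le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int_{H(\sqrt{cd})}\exp\left(\langle\Delta_n,u\rangle - \tfrac{1}{2}\|u\|^2\right)du$$
$$\le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int \tilde Z_n(u)\,du.$$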

The next lemma controls the tail of $Z_n$ relative to $\tilde Z_n$. In
order to achieve that, it makes use of a concentration inequality for
log-concave density functions developed by Lovász and Vempala in [15]. The
lemma is stated with a given bound on the norm of $\Delta_n$, which is
allowed to grow with the dimension. Such a bound on $\Delta_n$ can be
easily obtained with probability arbitrarily close to one by standard
arguments.

Lemma 4 Suppose that $\|\Delta_n\|^2 \le C_1 d$ and $\lambda_n(c) < 1/16$.
Then for every $k \ge 1$ we have


I 


n{eo+n~^^^J-^u)Zn{u)du<  (^supTv{u))  Te^''^''*^)   /"  Z„(u)dM  j  e^'^i' 


{u:\\u\\>kV^} 

where $c \ge 16\max\{4C_1,\ 1/(1-2\lambda_n(c))\}$.

Proof.  Define  H{a)  :=  {u  :  \\u\\  <  a}  and  its  complement  by  H{aY.  Then 
we  have 

7r(^o+n""^''V"'M)Z„(u)dw<       sup      7r(eo+n"^/2j-^u)  /  Z„,(u)du. 

H(kV3r  H(fc\/S)=  JHikV^Y 

Next  note  that  A^  G  H{kvcd).  Moreover,  for  any  u  6  H{k\/cd)'^  we  have 
some  u  :=  /cv^w/|juj|  such  that 

InZniu)     <     (A„,u)-i||uf(l-2A„(c)) 

<  fc\/^v'C^-|fc2cd(l-2A„(c)) 

<  -/cd(i-=%^/cc- v'Hcr) 

Under  our  assumptions  we  Iiave 


2  "     V  ~      \2 

since  c  >  16  and  assuming  A„(c)  <  1/4.  Using  Lemma  5.16  of  [15]  we  have 

SHihVTdY  Zn{u)du    <     {e^-^cf-'^  J  Zn{u)du 

<  2(ei-"c)^-i//f(v^)^n(«)du 

<  2(ei-^c)^-ie'=P^''W/^(^)Z„(w)du 

where  we  used  that  J  Zn{u)du  <  2/^/  /^>  Zn{u)du  (note  that  k  does  not 
appear). 

Since  c  >  16,  we  have  77i  :=  c  Ml  —  l/d)g  —  A„(c)  j  —  1  (since  we  expect 
A„(c)  ^  0  and  l/d  ^  0,  7?2  -»  f  -  1  >  f).   ■ 

We note that the value of $c$ in the previous lemma could depend on $n$ as
long as the condition is satisfied. In fact, we can have $c$ as large as

(B.18)  $a_n := \sup\{c : \lambda_n(c) < 1/16\}.$

(B.18) characterizes a neighborhood of size $\sqrt{a_n d}$ on which the
quantity $Z_n(\cdot)$ can still be bounded by a proper gaussian. Lemma 4
bounds the contribution outside this neighborhood. We close this section
with a technical lemma for bounding the difference between the expectation
of a function with respect to two probability densities.
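Lemma 5 Let $f$ and $g$ be two nonnegative integrable functions in
$\mathbb{R}^{d}$, and define $I_f = \int f(u)\,du$ and
$I_g = \int g(u)\,du$. Moreover, let $h$ be a third positive function and
$A \subset \mathbb{R}^{d}$ be a set such that
$h_f = \int_A h(u)f(u)\,du < \infty$. Then

$$\int_A h(u)\left|\frac{f(u)}{I_f} - \frac{g(u)}{I_g}\right|du \le \frac{\max_{u\in A}h(u) + h_f/I_f}{I_g}\int |f(u) - g(u)|\,du.$$

Proof. Simply note that

$$\int_A h(u)\left|\frac{f(u)}{I_f} - \frac{g(u)}{I_g}\right|du = \int_A h(u)\left|\frac{f(u)}{I_f} - \frac{f(u)}{I_g} + \frac{f(u)}{I_g} - \frac{g(u)}{I_g}\right|du$$
$$\le \left|\frac{1}{I_f} - \frac{1}{I_g}\right|\int_A h(u)f(u)\,du + \frac{1}{I_g}\int_A h(u)|f(u) - g(u)|\,du$$
$$\le \frac{|I_g - I_f|}{I_f I_g}\,h_f + \frac{1}{I_g}\int_A h(u)|f(u) - g(u)|\,du$$
$$\le \frac{\max_{u\in A}h(u) + h_f/I_f}{I_g}\int |f(u) - g(u)|\,du.$$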
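The inequality of Lemma 5 can be illustrated numerically on a one-dimensional grid. The sketch below reads the right-hand side as an integral of $|f - g|$ over the whole space (the step bounding $|I_g - I_f|$ uses the full integral), replaces integrals by Riemann sums, and uses hypothetical choices of $f$, $g$, $h$ and $A$ that are not taken from the paper.

```python
import math

def lemma5_sides(f, g, h, grid, step, in_a):
    """Riemann-sum versions of the two sides of the Lemma 5 inequality."""
    i_f = sum(f(u) for u in grid) * step
    i_g = sum(g(u) for u in grid) * step
    a_points = [u for u in grid if in_a(u)]
    h_f = sum(h(u) * f(u) for u in a_points) * step
    lhs = sum(h(u) * abs(f(u) / i_f - g(u) / i_g) for u in a_points) * step
    l1_distance = sum(abs(f(u) - g(u)) for u in grid) * step
    rhs = (max(h(u) for u in a_points) + h_f / i_f) / i_g * l1_distance
    return lhs, rhs

step = 0.01
grid = [-8.0 + step * i for i in range(1601)]

def f(u):
    return math.exp(-0.5 * u * u)  # unnormalized gaussian kernel

def g(u):
    return 1.3 * math.exp(-0.5 * (u - 0.2) ** 2)  # shifted and rescaled kernel

def h(u):
    return abs(u) ** 2  # plays the role of ||u||^alpha with alpha = 2

lhs, rhs = lemma5_sides(f, g, h, grid, step, lambda u: abs(u) <= 1.5)
```

In the proof of Theorem 2 the same bound is applied with $h(u) = \|u\|^{\alpha}$ and $A$ a ball, so that the maximum of $h$ over $A$ is an explicit power of the radius.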


APPENDIX  C:  PROOF  OF  THEOREMS  1  AND  2 

Armed with Lemmas 2, 3, 4, and 5, we now show the asymptotic normality
and moment convergence results (respectively Theorems 1 and 2) under the
appropriate growth conditions of the dimension of the parameter space with
respect to the sample size.

It is easy to see that Theorem 1 follows from Theorem 2 with $\alpha = 0$;
therefore its proof is omitted.

Proof of Theorem 2. Let

$$M_{d,\alpha} = (d+\alpha)\left(1 + \frac{\alpha\ln(d+\alpha)}{d+\alpha}\right).$$

In the case that $\alpha$ is constant and $d$ grows to infinity, this
simplifies to a multiple of $d$. We will be using that
$\sqrt{cM_{d,\alpha}} > 4\|\Delta_n\|$ in the analysis (recall that
$\|\Delta_n\| = O(\sqrt{d})$). We will divide the integral of (2.7) in two
regions

$$A = \{u \in \mathbb{R}^{d} : \|u\| \le \sqrt{cM_{d,\alpha}}\} \quad\text{and}\quad A^c,$$

where $c$ is a fixed constant. Thus we have

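(C.19)  $\int \|u\|^{\alpha}|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du \le \int_A \|u\|^{\alpha}|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du + \int_{A^c} \|u\|^{\alpha}|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du.$

To bound the first term, we will use Lemma 5 with $h(u) = \|u\|^{\alpha}$
and $A$ as defined above. In this case, we have
$h_f/I_f \le \max_{u\in A}h(u) = c^{\alpha/2}M_{d,\alpha}^{\alpha/2}$.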
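Therefore

$$\int_A \|u\|^{\alpha}|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du \le \frac{2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}}{\int \pi(\theta_0)\tilde Z_n(u)\,du}\int_A \left|\pi(\theta_0 + n^{-1/2}J^{-1}u)Z_n(u) - \pi(\theta_0)\tilde Z_n(u)\right|du$$
$$\le \frac{2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}}{\int \tilde Z_n(u)\,du}\int_A \left|\frac{\pi(\theta_0 + n^{-1/2}J^{-1}u)}{\pi(\theta_0)} - 1\right|Z_n(u)\,du + \frac{2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}}{\int \tilde Z_n(u)\,du}\int_A |Z_n(u) - \tilde Z_n(u)|\,du.$$

To bound the very last term we apply Lemma 3 with
$c_{d,\alpha} = cM_{d,\alpha}/d$ to obtain

$$\frac{2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}}{\int \tilde Z_n(u)\,du}\int_A |Z_n(u) - \tilde Z_n(u)|\,du \le 2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}\,c_{d,\alpha}d\,\lambda_n(c_{d,\alpha})\,e^{c_{d,\alpha}d\lambda_n(c_{d,\alpha})},$$

which converges to zero under our assumption (ii').

On the other hand, the first term is bounded by assumption (iv').
Moreover, assumption (iv') also ensures that the term converges to zero as
follows:

$$2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}\sup_{u\in A}\left|\frac{\pi(\theta_0 + n^{-1/2}J^{-1}u)}{\pi(\theta_0)} - 1\right|\frac{\int_A Z_n(u)\,du}{\int \tilde Z_n(u)\,du} \le 2c^{\alpha/2}M_{d,\alpha}^{\alpha/2}\,e^{c_{d,\alpha}d\lambda_n(c_{d,\alpha})}\left(\exp\left(K_n(c_{d,\alpha})\sqrt{\|F^{-1}\|cM_{d,\alpha}/n}\right) - 1\right) \to 0.$$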


The  second  term  of  (C.19)  is  bounded  above  by 


^  ^    '\u\\''Z„(u)du+- 1 /     ||ii||"7r(6'o+n-'/-J-^u)Z„(u)du. 


J  Zn(u)du  J  A 


JniOo)Z^{u)duJA 


The first term above converges to zero by standard bounds on Gaussian densities for an appropriate choice of the constant $c$ (note that $c$ can be chosen independently of $d$ and $\alpha$).

Finally, we bound the last term. Let $A_k := \{u : \|u\| \in [k\sqrt{cM_{d,\alpha}}, (k+1)\sqrt{cM_{d,\alpha}}]\}$. Thus we have

\[ \int_{A^c} \|u\|^\alpha \pi(\theta_0 + n^{-1/2}J^{-1}u) Z_n(u)\,du \le \sum_{k=1}^{\infty} (k+1)^\alpha c^{\alpha/2} M_{d,\alpha}^{\alpha/2} \int_{A_k} \pi(\theta_0 + n^{-1/2}J^{-1}u) Z_n(u)\,du. \tag{C.20} \]

Using Lemma 4 for each integral we have
f    Tx(ea+n-^'^J-\i)Zn{u)du<  (^sup7r(u))  ("e^^''^- '''"('="■"'  j  Zn{ii)chu\  e''''^'^-''''/^ . 

Since $M_{d,\alpha} \ge \max\{1,\alpha\}$ we have

oo 

7:=i 

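by choosing $c$ large enough. Moreover, our definition of $M_{d,\alpha}$ also implies that

\[ c^{\alpha/2} M_{d,\alpha}^{\alpha/2} e^{-cM_{d,\alpha}/10} = \exp\left(\frac{\alpha}{2}\ln c + \frac{\alpha}{2}\ln M_{d,\alpha} - \frac{cM_{d,\alpha}}{10}\right) \le \exp\left(-cM_{d,\alpha}/20\right) \]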


provided  that  c  is  large  enough. 
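
As a sanity check on this last inequality, the following Python sketch evaluates both sides on the log scale over a small grid. It assumes the reading $M_{d,\alpha} = (d+\alpha)(1 + \alpha\ln(d+\alpha)/(d+\alpha))$ of the definition at the start of the proof; the function names, the grid, and the trial values $c = 100$ and $c = 1$ are illustrative choices and do not come from the paper.

```python
import math

def m_d_alpha(d, alpha):
    """Assumed reading of M_{d,alpha}: (d + alpha) * (1 + alpha*ln(d + alpha)/(d + alpha))."""
    s = d + alpha
    return s * (1.0 + alpha * math.log(s) / s)

def log_lhs(c, d, alpha):
    """Log of c^{alpha/2} * M^{alpha/2} * exp(-c*M/10)."""
    m = m_d_alpha(d, alpha)
    return 0.5 * alpha * (math.log(c) + math.log(m)) - c * m / 10.0

def log_rhs(c, d, alpha):
    """Log of exp(-c*M/20)."""
    return -c * m_d_alpha(d, alpha) / 20.0

def holds_on_grid(c, dims, alphas):
    """True if the inequality holds at every (d, alpha) pair of the grid."""
    return all(log_lhs(c, d, a) <= log_rhs(c, d, a) for d in dims for a in alphas)

# Hypothetical grid of dimensions and moment orders.
DIMS = [1, 10, 100, 1000]
ALPHAS = [0.0, 0.5, 1.0, 2.0, 10.0, 100.0]

print(holds_on_grid(100.0, DIMS, ALPHAS))
print(holds_on_grid(1.0, DIMS, ALPHAS))
```

On this grid the inequality holds for $c = 100$ and fails for $c = 1$ (for instance at $d = 1$, $\alpha = 1$), which is in line with the requirement that $c$ be large enough.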

We  have  that  (C.20)  can  be  bounded  above  by 

(sup^(u))  U^^^'-^-i^'--')  I  Zn{u)dv\  e-'^^o--'^^  =  o  (n{eo)  I  Zniu)du\ 

under our assumptions. Therefore the result follows. □
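
The ring decomposition behind (C.20) can be illustrated in a simplified setting. The sketch below is one-dimensional and replaces the integrand $\pi(\cdot)Z_n(\cdot)$ by the unnormalized Gaussian weight $e^{-u^2/2}$; both simplifications, the variable `r` (standing in for $\sqrt{cM_{d,\alpha}}$), and the numerical values are assumptions made for illustration only. It checks that the weighted tail integral is dominated by the sum over rings of $((k+1)r)^\alpha$ times the mass of the $k$-th ring.

```python
import math

def ring_mass(a, b):
    """Integral of exp(-u^2/2) over {a <= |u| <= b} in one dimension."""
    root2 = math.sqrt(2.0)
    return math.sqrt(2.0 * math.pi) * (math.erfc(a / root2) - math.erfc(b / root2))

def weighted_tail(r, alpha, rings, steps=20000):
    """Midpoint-rule integral of |u|^alpha * exp(-u^2/2) over r <= |u| <= (rings+1)*r."""
    lo, hi = r, (rings + 1) * r
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += (u ** alpha) * math.exp(-0.5 * u * u)
    return 2.0 * total * h  # factor 2 accounts for the two symmetric sides

def ring_bound(r, alpha, rings):
    """Sum over k of ((k+1)*r)^alpha times the mass of the ring k*r <= |u| <= (k+1)*r."""
    return sum(((k + 1) * r) ** alpha * ring_mass(k * r, (k + 1) * r)
               for k in range(1, rings + 1))

# Hypothetical values: r plays the role of sqrt(c * M_{d,alpha}).
r, alpha, rings = 2.0, 1.5, 10
tail = weighted_tail(r, alpha, rings)
bound = ring_bound(r, alpha, rings)
print(tail <= bound)
```

The bound is loose by design: on each ring the weight $|u|^\alpha$ is replaced by its largest value, which is what allows the remaining integrals to be handled one ring at a time.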

APPENDIX  D:  PROOFS  OF  SECTION  3 
Proof of Lemma 1. Divide $\Gamma$ into three regions:

\[ (I) := B(0, kN_G), \qquad (II) := \{\gamma : \max\{\|\gamma\|, \|u_\gamma\|\} \le kN_T\} \setminus B(0, kN_G), \]

\[ (III) := \Gamma \setminus ((I) \cup (II)), \]

where $k$ is chosen later to be large enough, independently of the dimensions $d$ and $d_1$. Region $(I)$ is defined to be the region where the linear approximation $G$ for $\theta(\cdot)$ is valid in the sense of Assumption A. Region $(III)$ represents the tail of the distribution; either $\gamma$ or $u_\gamma$ has large norm. Finally, region $(II)$ is an intermediate region for which $G$ is not a valid approximation but we still have useful guarantees for deviations from normality. We point out that regions $(II)$ and $(III)$ might be highly non-convex. We will derive sufficient conditions on the values of $N_G$ and $N_T$ as functions of $d$ and $d_1$. It will be sufficient to set $N_G = \sqrt{d}$ and $N_T = \sqrt{d}\log d$.
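
To see the scale these choices imply, the following worked substitution evaluates the ratios $k^2N_G^2/d$ and $k^2N_T^2/d$, which are the constants $c_G$ and $c_T$ introduced next. It assumes the reading $N_G = \sqrt{d}$ and $N_T = \sqrt{d}\log d$ (with the logarithm outside the square root).

```latex
\[
\frac{k^2 N_G^2}{d} = \frac{k^2\, d}{d} = k^2,
\qquad
\frac{k^2 N_T^2}{d} = \frac{k^2\, d\, (\log d)^2}{d} = k^2 (\log d)^2 .
\]
```

Under this reading the first constant does not depend on $d$, while the second grows only polylogarithmically in $d$.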

For notational convenience we define $c_G = k^2N_G^2/d$ and $c_T = k^2N_T^2/d$. Our assumptions are such that

\[ d\lambda_n(c_G) \to 0 \quad\text{and}\quad \lambda_n(c_T) \le 1/16. \]

We first bound the contribution of region $(III)$. For any $\gamma \in (III)$, define $\bar u_\gamma = kN_G\, u_\gamma/\|u_\gamma\| \in U$. Using Lemma 2 we have

\[ \ln Z_n(\bar u_\gamma) \le \langle \Delta_n, \bar u_\gamma \rangle - \frac{1}{2}\left(1 - 2\lambda_n(c_G)\right)\|\bar u_\gamma\|^2. \]

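Since $\ln Z_n(\cdot)$ is concave in $U$ and $\ln Z_n(0) = 0$ by design, we have, with $\bar u_\gamma$ the rescaled point of norm $kN_G$ defined above,

\[ \ln Z_n(u_\gamma) \le \frac{\|u_\gamma\|}{kN_G}\ln Z_n(\bar u_\gamma) \le -\|u_\gamma\| N_G\left(\frac{(1-2\lambda_n(c_G))\,k}{2} - \frac{\|\Delta_n\|}{N_G}\right) \le -\|u_\gamma\| N_G\,\frac{k}{5}, \tag{D.21} \]

where the last inequality holds for $k$ sufficiently large when $\lambda_n(c_G) \le 1/16$. The contribution of $(III)$ can be bounded by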


J {II I)     ^  '  \BeeT^[f^o)J  J[iii)         V     5  J 

The integral on the right can be bounded as follows
/(///)  e-^p(-T^G||w7ll)d7     <     /B(o,fc/v^)n(/7/)exp(-f  iVG||u^||)(i7 

By  definition  of  (///),  7  £  B{0,Nt)  D  (III)  imphes  ||u^||  >  kMr-  On  the 
other  hand  7  G  B{0,Nt)  implies  that  \\uj\\  >  EoNt-  A  standard  bound  on 
the  integrals  yields 


/(///)  exp  {-'^Ng\\u4)  dj     <     exp  (-'^NgNt  +  di  ln{kNT)) 
+    exp(-f^eoNGNT  +  dilndi)  . 


Using  the  assumption  on  the  prior,  we  can  bound  the  contribution  of 
(///)  by 

/  _  p 

(D.22)  7r(6io)exp    Cprrord  +  dj  Indj  +  di  lu{kNT)  ~  -j-o^gNt 

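Next consider $\gamma \in (II)$. By definition $\gamma \in B(0, kN_T) \setminus B(0, kN_G)$. Under the assumption that $\lambda_n(c_T) \le 1/16$, we have that

\[ \ln Z_n(u_\gamma) \le \langle \Delta_n, u_\gamma \rangle - \frac{1}{2}\cdot\frac{7}{8}\|u_\gamma\|^2. \]

Therefore, by choosing $k$ such that $kN_G > 8\|\Delta_n\|$, we have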

n(0{'n,)  +  n-'^'u,)zUn,)d-f     <     n{eo)(sup^)f      exp  ( {An,v.^)  -  ^-hu^f)  d-y 
11)     ^  '  \eee'!^(t^o)J  Jill)         \  26  J 


-  "^'°^'^^g^(v'/    '^P 


Ilu-vll"  I  d"/ 

2  16"   ^"  / 
Again using our assumption on the prior and standard bounds on Gaussian densities, we can bound the contribution of $(II)$ by

(D.23)  Tr{9o)exp{cpriord+d,  ln(l/£o)  -  y^-'^c) 

Finally,  we  show  a  lower  bound  on  the  integral  over  (/).  First  note  that 
for  any  7  G  (/)  condition  (3.10)  holds  and  we  have  u-y  =  ri„  +  (/  +  R2n)Gl- 
Therefore,  u^  £  BiO,{\\G\\  + 5n2)kNG  +  5ni)  C  B{0,2\\G\\kNG)-  For  simplic- 


ity,  letC(;)=4|lGf7V2/d 


I{i)'^{^ivo)  +  n   ^/-UyjZn{u~,)d-i    >    7r(6'o)exp  f-/\„(c(/))y'^j /(;)Z„(u^)d7 


Under  our  assumptions  exp  I  ~-'^'n(c{/))\/ ~^  )    ^^   ■'■•  Furthermore,  using 
(3.11),  |iA„||  =  0{Vd),  and  II7II  <  kNa,  we  have 

lnZ„(ri„  +  (/  +  i?2„)G7)     =     (A„,ri„  +  (/ + /?2n)G7) - 

-'-^^^4^\\r:n  +  {I  +  R2n)Gyf  -    .' 

>     o(l)  +  (A„,G7)-^i^^#^|!G7f. 
Therefore  we  have 

/(O  n  (^('?o)  +  n-'^hi,)  Zn{u^)dy     >     -K{e,)0  (/,,)  exp  ((A„,  G7)  -  '-^^^.\\G^\\-)  d^) 

>  7r(eo)0((l-2A„(c(;)))^'/2det(G'G)-^/2) 

>  7t{9o)0  {expH\G\\di)) . 

Choosing $N_G = \sqrt{d}$, $N_T = \sqrt{d}\log d$ and $k$ sufficiently large, the result follows since we have $d \ge d_1$. □

Proof of Theorem 3. Let $\gamma$ be such that $\eta = \eta_0 + n^{-1/2}\gamma$. We will show that $\ln Z(u_\gamma) \le -cd$ for any $\gamma \notin B(0, k\sqrt{d})$ where $k$ is sufficiently large. Therefore, since the contribution of the prior is bounded by (iv), the MLE satisfies $\hat\gamma \in B(0, k\sqrt{d})$ and the result follows.

Using (D.21) with $N_G = \sqrt{d}$ we have

\[ \ln Z(u_\gamma) \le -\|u_\gamma\|\sqrt{d}\,k/5 \le -\varepsilon_0 k^2 d/5. \]

As stated earlier, the result follows by choosing $k$ sufficiently large. ■
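
The scaling step behind (D.21) uses only that a concave function $f$ with $f(0) = 0$ satisfies $f(u) \le (\|u\|/r)\, f(r u/\|u\|)$ whenever $\|u\| \ge r$. The Python sketch below checks this numerically for a stand-in function; the quadratic `log_ratio`, the dimension, the radius, and the random draws are all hypothetical choices made for illustration and are not the paper's $\ln Z_n$.

```python
import math
import random

def log_ratio(u, delta):
    """Stand-in for ln Z_n: the concave quadratic <delta, u> - ||u||^2/2, equal to 0 at u = 0."""
    return sum(di * ui for di, ui in zip(delta, u)) - 0.5 * sum(ui * ui for ui in u)

def norm(u):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(ui * ui for ui in u))

def scaling_gap(u, delta, radius):
    """(||u||/radius) * f(radius*u/||u||) - f(u); nonnegative when ||u|| >= radius."""
    n = norm(u)
    u_bar = [radius * ui / n for ui in u]  # rescaled point on the sphere of the given radius
    return (n / radius) * log_ratio(u_bar, delta) - log_ratio(u, delta)

random.seed(0)
d, radius = 5, 3.0
delta = [random.gauss(0.0, 1.0) for _ in range(d)]

gaps = []
while len(gaps) < 1000:
    u = [random.gauss(0.0, 4.0) for _ in range(d)]
    if norm(u) >= radius:  # keep only points outside the ball, as in region (III)
        gaps.append(scaling_gap(u, delta, radius))

print(min(gaps) >= -1e-9)
```

For this particular quadratic the gap equals $\tfrac{1}{2}\|u\|(\|u\| - r)$, so the check is exact up to rounding; the small tolerance only absorbs floating-point error.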
Proof of Theorem 4. Using Lemma 1 and known results for Gaussian densities, we can restrict our analysis to $B(0, k\sqrt{d})$ since the remaining part has negligible mass.

The remainder of the proof follows the same steps as the proof of Theorem 2.